Search CORE

8 research outputs found

Domain Adaptation for Statistical Classifiers

Author: Daume III H.; Marcu D.
Publication venue: 'AI Access Foundation'
Publication date: 28/09/2011

The most basic assumption used in statistical learning theory is that training data and test data are drawn from the same underlying distribution. Unfortunately, in many applications, the "in-domain" test data is drawn from a distribution that is related, but not identical, to the "out-of-domain" distribution of the training data. We consider the common case in which labeled out-of-domain data is plentiful, but labeled in-domain data is scarce. We introduce a statistical formulation of this problem in terms of a simple mixture model and present an instantiation of this framework to maximum entropy classifiers and their linear chain counterparts. We present efficient inference algorithms for this special case based on the technique of conditional expectation maximization. Our experimental results show that our approach leads to improved performance on three real-world tasks on four different data sets from the natural language processing domain.

arXiv.org e-Print Archive

Crossref

Domain Adaptation for Resume Classification Using Convolutional Neural Networks

Author: E Tutubalina; H Daume III; LVD Maaten; S Ben-David; ST Al-Otaibi; W Hong; Y Ganin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/07/2017

We propose a novel method for classifying resume data of job applicants into 27 different job categories using convolutional neural networks. Since resume data is costly and hard to obtain due to its sensitive nature, we use domain adaptation. In particular, we train a classifier on a large number of freely available job description snippets and then use it to classify resume data. We empirically verify a reasonable classification performance of our approach despite having only a small amount of labeled resume data available. Peer reviewed.

arXiv.org e-Print Archive

Crossref

Aaltodoc Publication Archive

Improving Transfer Learning by Introspective Reasoner

Author: A. Argyriou; B. Bakker; D.B. Leake; G. Kuhlmann; H. Daume III; L. Birnbaum; M. Cox; M.T. Cox; Q. Dong; S. Craw; S.J. Pan; T. Croonenborghs; W. Cheetham; W.Y. Dai; X. Ling; Z. Fuzhen; Z. Shi; Z. Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Crossref

Evaluation of domain adaptation approaches for robust classification of heterogeneous biological data sets.

Author: AA Alizadeh; Burkhard Rost; C Bernau; C Cortes; C Widmer; F Buggenthin; H Daume III; HY Xiong; JT Leek; L Breiman; L Jacob; M Held; M Helmstaedter; M Sokolova; SJ Pan; T Blasi; VM Patel; X Wang; Y Ganin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Most machine learning algorithms require that training data are identically distributed to ensure effective learning. In biological studies, however, even small variations in the experimental setup can lead to substantial deviations. Domain adaptation offers tools to deal with this problem. It is particularly useful for cases where only a small amount of training data is available in the domain of interest, while a large amount of training data is available in a different, but relevant domain. We investigated to what extent domain adaptation was able to improve prediction accuracy for complex biological data. To that end, we used simulated data and time-lapse movies of differentiating blood stem cells in different cell cycle stages from multiple experiments and compared three commonly used domain adaptation approaches. EasyAdapt, a simple technique of structured pooling of related data sets, was able to improve accuracy when classifying the simulated data and cell cycle stages from microscopic images. Meanwhile, the technique proved robust to the potential negative impact on the classification accuracy that is common in other techniques that build models with heterogeneous data. Despite its implementation simplicity, EasyAdapt consistently produced more accurate predictions compared to conventional techniques. Domain adaptation is therefore able to substantially reduce the amount of work required to create a large amount of annotated training data in the domain of interest, which is necessary whenever the domain changes even a little; such changes are common not only in biological experiments but in almost all data collection routines.
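
The abstract does not spell out how EasyAdapt pools related data sets. As a rough illustration, the sketch below assumes the feature-augmentation formulation usually associated with that name: each feature vector is copied into a shared block plus a domain-specific block, so a single linear classifier can learn shared and per-domain weights. The function name `easyadapt_augment` and the two-domain setup are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def easyadapt_augment(X, domain):
    """Return features laid out as [shared | source-only | target-only].

    Hypothetical helper for illustration: `domain` is "source" or "target".
    """
    X = np.asarray(X, dtype=float)
    zeros = np.zeros_like(X)
    if domain == "source":
        # Source rows fill the shared and source blocks; target block is zero.
        return np.hstack([X, X, zeros])
    if domain == "target":
        # Target rows fill the shared and target blocks; source block is zero.
        return np.hstack([X, zeros, X])
    raise ValueError("domain must be 'source' or 'target'")

# Pool both domains into one training matrix of width 3 * n_features.
X_source = np.array([[1.0, 2.0], [3.0, 4.0]])
X_target = np.array([[5.0, 6.0]])
X_pooled = np.vstack([
    easyadapt_augment(X_source, "source"),
    easyadapt_augment(X_target, "target"),
])
print(X_pooled.shape)
```

The pooled matrix can then be passed to any standard classifier; under this assumption the adaptation happens entirely in the feature layout, which is why the approach is simple to implement.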

Crossref

PuSH